
    Representation of probabilistic scientific knowledge

    This article is available through the Brunel Open Access Publishing Fund. Copyright © 2013 Soldatova et al; licensee BioMed Central Ltd. The theory of probability is widely used in biomedical research for data analysis and modelling. In previous work the probabilities of the research hypotheses have been recorded as experimental metadata. The ontology HELO is designed to support probabilistic reasoning, and provides semantic descriptors for reporting on research that involves operations with probabilities. HELO explicitly links research statements such as hypotheses, models, laws, conclusions, etc. to the associated probabilities of these statements being true. HELO enables the explicit semantic representation and accurate recording of probabilities in hypotheses, as well as the inference methods used to generate and update those hypotheses. We demonstrate the utility of HELO on three worked examples: changes in the probability of the hypothesis that sirtuins regulate human life span; changes in the probability of hypotheses about gene functions in the S. cerevisiae aromatic amino acid pathway; and the use of active learning in drug design (quantitative structure activity relation learning), where a strategy for the selection of compounds with the highest probability of improving on the best known compound was used. HELO is open source and available at https://github.com/larisa-soldatova/HELO. This work was partially supported by grant BB/F008228/1 from the UK Biotechnology & Biological Sciences Research Council, from the European Commission under the FP7 Collaborative Programme, UNICELLSYS, KU Leuven GOA/08/008 and ERC Starting Grant 240186.

    Dissecting schizophrenia phenotypic variation: the contribution of genetic variation, environmental exposures, and gene–environment interactions

    Schizophrenia is among the leading causes of disability worldwide. Prior studies have conclusively demonstrated that the etiology of schizophrenia contains a strong genetic component. However, environmental contributions and gene–environment interactions have remained less well understood. Here, we estimated the genetic and environmental contributions to schizophrenia risk using a unique combination of data sources and mathematical models. We used the administrative health records of 481,657 U.S. individuals organized into 128,989 families. In addition, we employed rich geographically specific measures of air, water, and land quality across the United States. Using models of progressively increasing complexity, we examined both linear and non-linear contributions of genetic variation and environmental exposures to schizophrenia risk. Our results demonstrate that heritability estimates differ significantly when gene–environment interactions are included in the models, dropping from 79% for the simplest model to 46% in the best-fit model, which included the full set of linear and non-linear parameters. Taken together, these findings suggest that environmental factors are an important source of explanatory variance underlying schizophrenia risk. Future studies are warranted to further explore linear and non-linear environmental contributions to schizophrenia risk and investigate the causality of these associations.

    Centralized scientific communities are less likely to generate replicable results

    Concerns have been expressed about the robustness of experimental findings in several areas of science, but these matters have not been evaluated at scale. Here we identify a large sample of published drug-gene interaction claims curated in the Comparative Toxicogenomics Database (for example, benzo(a)pyrene decreases expression of SLC22A3) and evaluate these claims by connecting them with high-throughput experiments from the LINCS L1000 program. Our sample included 60,159 supporting findings and 4253 opposing findings about 51,292 drug-gene interaction claims in 3363 scientific articles. We show that claims reported in a single paper replicate 19.0% (95% confidence interval [CI], 16.9–21.2%) more frequently than expected, while claims reported in multiple papers replicate 45.5% (95% CI, 21.8–74.2%) more frequently than expected. We also analyze the subsample of interactions with two or more published findings (2493 claims; 6272 supporting findings; 339 opposing findings; 1282 research articles), and show that centralized scientific communities, which use similar methods and involve shared authors who contribute to many articles, propagate less replicable claims than decentralized communities, which use more diverse methods and contain more independent teams. Our findings suggest how policies that foster decentralized collaboration will increase the robustness of scientific findings in biomedical research.

    Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users

    Motivation: Much current research in biomedical text mining is concerned with serving biologists by extracting certain information from scientific text. We note that there is no ‘average biologist’ client; different users have distinct needs. For instance, as noted in past evaluation efforts (BioCreative, TREC, KDD), database curators are often interested in sentences showing experimental evidence and methods. Conversely, lab scientists searching for known information about a protein may seek facts, typically stated with high confidence. Text-mining systems can target specific end-users and become more effective if the system can first identify text regions rich in the type of scientific content that is of interest to the user, retrieve documents that have many such regions, and focus on fact extraction from these regions. Here, we study the ability to characterize and classify such text automatically. We have recently introduced a multi-dimensional categorization and annotation scheme, developed to be applicable to a wide variety of biomedical documents and scientific statements, while intended to support specific biomedical retrieval and extraction tasks.

    Looking at Cerebellar Malformations through Text-Mined Interactomes of Mice and Humans

    We have generated and made publicly available two very large networks of molecular interactions: 49,493 mouse-specific and 52,518 human-specific interactions. These networks were generated through automated analysis of 368,331 full-text research articles and 8,039,972 article abstracts from the PubMed database, using the GeneWays system. Our networks cover a wide spectrum of molecular interactions, such as bind, phosphorylate, glycosylate, and activate; 207 of these interaction types occur more than 1,000 times in our unfiltered, multi-species data set. Because mouse and human genes are linked through an orthological relationship, human and mouse networks are amenable to straightforward, joint computational analysis. Using our newly generated networks and known associations between mouse genes and cerebellar malformation phenotypes, we predicted a number of new associations between genes and five cerebellar phenotypes (small cerebellum, absent cerebellum, cerebellar degeneration, abnormal foliation, and abnormal vermis). Using a battery of statistical tests, we showed that genes that are associated with cerebellar phenotypes tend to form compact network clusters. Further, we observed that cerebellar malformation phenotypes tend to be associated with highly connected genes. This tendency was stronger for developmental phenotypes and weaker for cerebellar degeneration.